Threshold Exceedance Model
The stationary ThresholdExceedance model is illustrated using the daily rainfall accumulations at a location in south-west England from 1914 to 1962. This dataset was studied by Coles (2001) in Chapter 4. The daily rainfall are assume independent and identically distributed.
The Extremes.jl package supports maximum likelihood inference, Bayesian inference and inference based on the probability weigthed moments. For the GEV parameter estimation, the following functions can be used:
gpfitpwm: estimation with the probability weighted moments;gpfit: maximum likelihood estimation;gpfitbayes: Bayesian estimation.
These functions return the estimate of the log-scale parameter $\phi = \log \sigma$.
Load the data
Loading the daily precipitations:
data = Extremes.dataset("rain")
first(data,5)5 rows × 2 columns
| Date | Rainfall | |
|---|---|---|
| Date | Float64 | |
| 1 | 1914-01-01 | 0.0 |
| 2 | 1914-01-02 | 2.3 |
| 3 | 1914-01-03 | 1.3 |
| 4 | 1914-01-04 | 6.9 |
| 5 | 1914-01-05 | 4.6 |
Plotting the data using the Gadfly package:
plot(data, x=:Date, y=:Rainfall, Geom.point, Theme(discrete_highlight_color=c->nothing))
Threshold selection
A suitable threshold for the Peaks-Over-Threshold model can be chosen by examining the mean residual life plot. The mean residual life is expected to be a linear function of the threshold when the latter is high enough. The mean residual life plot can be plotted with the mrlplot function:
mrlplot(data[:,:Rainfall])
As concluded by Coles (2001, Chapter 4), a reasonable threshold is 30 mm.
threshold = 30.0Exceedances extraction
Parameter estimation of the Generalized Pareto distribution in Extremes.jl is performed using the threshold exceedances previously extracted. The support of the exceedances given in the fit function is therefore $(0,∞)$.
For the Rainfall example, let's extract the threshold exceedances.
Identify first the threshold exceedances:
threshold = 30.0
df = filter(row -> row.Rainfall > threshold, data)
first(df, 5)5 rows × 2 columns
| Date | Rainfall | |
|---|---|---|
| Date | Float64 | |
| 1 | 1914-02-07 | 31.8 |
| 2 | 1914-03-08 | 32.5 |
| 3 | 1914-12-17 | 31.8 |
| 4 | 1914-12-30 | 44.5 |
| 5 | 1915-02-13 | 30.5 |
Retrieve the exceedances above the threshold:
df[!,:Rainfall] = df[!,:Rainfall] .- threshold
rename!(df, :Rainfall => :Exceedance)
first(df, 5)5 rows × 2 columns
| Date | Exceedance | |
|---|---|---|
| Date | Float64 | |
| 1 | 1914-02-07 | 1.8 |
| 2 | 1914-03-08 | 2.5 |
| 3 | 1914-12-17 | 1.8 |
| 4 | 1914-12-30 | 14.5 |
| 5 | 1915-02-13 | 0.5 |
Maximum likelihood inference
GP parameter estimation
The Generalized Pareto maximum likelihood parameter estimates are computed with the gpfit function:
julia> fm = gpfit(df, :Exceedance)MaximumLikelihoodEVA model : ThresholdExceedance data : Vector{Float64}[152] logscale : ϕ ~ 1 shape : ξ ~ 1 θ̂ : [2.006896498380506, 0.1844926991237574]
In this example, the gpfit function is called using the data DataFrame structure as the first argument. The function can also be called using the vector of maxima as the first argument, e.g. gpfit(df[:,:Exceedance]).
The vector of the parameter estimates $\hat\mathbf{\theta} = (ϕ̂,\, ξ̂)^\top$ is contained in the field θ̂ of the structure fm:<fittedEVA.
The approximate covariance matrix of the parameter estimates can be obtained with the parametervar function:
julia> parametervar(fm)2×2 Matrix{Float64}: 0.0165972 -0.00880429 -0.00880429 0.0102416
Confidence intervals for the parameters are obtained with the cint function:
julia> cint(fm)2-element Vector{Vector{Float64}}: [1.7543940197850576, 2.2593989769759544] [-0.013857525987188757, 0.38284292423470356]
For instance, the shape parameter 95% confidence interval is as follows:
julia> cint(fm)[2]2-element Vector{Float64}: -0.013857525987188757 0.38284292423470356
Diagnostic plots
Several diagnostic plots for assessing the accuracy of the fitted GP distribution to the rainfall data are can be shown with the diagnosticplots function:
set_default_plot_size(21cm ,16cm)
diagnosticplots(fm)
The diagnostic plots consist in the probability plot (upper left panel), the quantile plot (upper right panel), the density plot (lower left panel) and the return level plot (lower right panel). These plots can be displayed separately using respectively the probplot, qqplot, histplot and returnlevelplot functions.
Return level estimation
T-year return level estimate can be obtained using the returnlevel function. Along with the fitted Generalized Pareto distribution for the threshold exceedances, the following informations:
- the threshold;
- the number of total observation;
- the number of observation per year.
are also needed for estimating the T-year return level using the POT model.
For example, the 100-year return level for the rainfall POT model is computed as follows:
julia> nobs = size(data,1)17531julia> nobsperblock = 365365julia> r = returnlevel(fm, threshold, nobs, nobsperblock, 100)ReturnLevel returnperiod : 100 value : Vector{Float64}[1]
The return level can be accessed as follows:
julia> r.value1-element Vector{Float64}: 106.32558691303024
The corresponding confidence interval can be computed with the cint function:
julia> c = cint(r)1-element Vector{Vector{Real}}: [65.4816377433487, 147.16953608271177]
In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.
To get the scalar return level in the stationary case, the following command can be used:
julia> r.value[]106.32558691303024
To get the scalar confidence interval in the stationary case, the following command can be used:
julia> c[]2-element Vector{Real}: 65.4816377433487 147.16953608271177
Bayesian Inference
Most functions described in the previous sections also work in the Bayesian context.
GP parameter estimation
The Bayesian GP parameter estimation is performed with the gpfitbayes function:
julia> fm = gpfitbayes(df, :Exceedance)Progress: 9%|███▉ | ETA: 0:00:01 Progress: 19%|███████▊ | ETA: 0:00:01 Progress: 28%|███████████▌ | ETA: 0:00:01 Progress: 38%|███████████████▍ | ETA: 0:00:01 Progress: 46%|███████████████████ | ETA: 0:00:01 Progress: 58%|███████████████████████▋ | ETA: 0:00:00 Progress: 69%|████████████████████████████▏ | ETA: 0:00:00 Progress: 80%|████████████████████████████████▊ | ETA: 0:00:00 Progress: 91%|█████████████████████████████████████▍ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:01 BayesianEVA model : ThresholdExceedance data : Vector{Float64}[152] logscale : ϕ ~ 1 shape : ξ ~ 1 sim : Mamba.Chains Iterations : 2001:5000 Thinning interval : 1 Chains : 1 Samples per chain : 3000 Value : Array{Float64, 3}[3000,2,1]
Currently, only the improper uniform prior is implemented, i.e. \[ f_{(ϕ,ξ)}(ϕ,ξ) ∝ 1. \] It yields to a proper posterior as long as the sample size is larger than 2 (Northrop and Attalides, 2016).
Currently, the No-U-Turn Sampler extension (Hoffman and Gelman, 2014) to Hamiltonian Monte Carlo (Neel, 2011, Chapter 5) is implemented for simulating an autocorrelated sample from the posterior distribution.
The generated sample from the posterior distribution is contained in the field sim of the fitted structure. It is an object of type Chains from the Mamba.jl package.
Credible intervals for the parameters are obtained with the cint function:
julia> cint(fm)2-element Vector{Vector{Float64}}: [1.7246897436752568, 2.2395642679956893] [0.009316841484997451, 0.42787768643177326]
Inference based on the probability weighted moments
Most functions described in the previous sections also work for the model fitted with the probability weighted moments.
GP parameter estimation
The parameter estimation based on the probability weighted moments is performed with the gpfitpwm function:
julia> fm = gpfitpwm(df, :Exceedance)pwmEVA model : ThresholdExceedance data : Vector{Float64}[152] logscale : ϕ ~ 1 shape : ξ ~ 1 θ̂ : [1.9877399514951732, 0.19651587232938317]
The approximate covariance matrix of the parameter estimates using a bootstrap procedure can be obtained with the parametervar function:
julia> parametervar(fm)2×2 Matrix{Float64}: 0.0149629 -0.00597755 -0.00597755 0.00629113
Confidence intervals on the parameter estimates using a bootstrap procedure can be obtained with the cint function:
julia> cint(fm)2-element Vector{Vector{Float64}}: [1.7544277889819984, 2.246149718607561] [0.010424545216936299, 0.3195324601611048]